Necessary and Sufficient Geometries for Gradient Methods

Neural Information Processing Systems

We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal. We further provide a converse that shows that when the constraints are not quadratically convex---for example, any $\ell_p$-ball for $p < 2$---the methods are far from optimal. Based on this, we can provide concrete recommendations for when one should use adaptive, mirror or stochastic gradient methods.
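To make the abstract's central object concrete, here is a minimal sketch of a diagonally preconditioned stochastic gradient method (an AdaGrad-style update) projected onto an $\ell_\infty$-ball, which is an example of a quadratically convex constraint set. This is an illustrative implementation, not the paper's exact algorithm; the function name, step size, and test problem are all assumptions chosen for the example.

```python
import numpy as np

def adagrad_linf(grad_fn, x0, radius=1.0, eta=0.5, steps=200, eps=1e-8):
    """Diagonally preconditioned SGD (AdaGrad-style per-coordinate step
    sizes) with Euclidean projection onto the l_inf ball of the given
    radius, a quadratically convex constraint set.

    grad_fn(x) returns a (possibly stochastic) gradient at x.
    """
    x = np.array(x0, dtype=float)
    g_sq = np.zeros_like(x)  # running sum of squared gradients per coordinate
    for _ in range(steps):
        g = grad_fn(x)
        g_sq += g * g
        # Diagonal preconditioner: each coordinate gets its own step size.
        x -= eta * g / (np.sqrt(g_sq) + eps)
        # Projection onto the l_inf ball is coordinate-wise clipping.
        x = np.clip(x, -radius, radius)
    return x

# Toy problem: minimize f(x) = 0.5 * ||x - target||^2 over the unit l_inf ball.
# The constrained optimum is the target clipped into the ball.
target = np.array([2.0, -0.3])
x_star = adagrad_linf(lambda x: x - target, x0=np.zeros(2))
```

For a non-quadratically-convex set such as an $\ell_1$-ball, the projection step would no longer be a simple coordinate-wise operation, which is where the paper's converse results apply.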


Reviews: Necessary and Sufficient Geometries for Gradient Methods

Neural Information Processing Systems

POST-AUTHOR FEEDBACK: I thank the authors for their explanations, which address my concerns. I look forward to seeing the camera-ready version with the improved results and clarifications. The paper is well-written, the technical results are precise and correct as far as I can tell, and the results are novel and interesting. However, I am slightly puzzled over a few points, some of which pertain to the interpretation of the results. I believe the paper would significantly benefit from clarifying the following points: * The authors refer to mirror descent with a (fixed) weighted Euclidean distance as an adaptive gradient algorithm, but this connection is not clear to me.
